Deep Learning of Invariant Features via Simulated Fixations in Video

Authors

  • Will Y. Zou
  • Andrew Y. Ng
  • Shenghuo Zhu
  • Kai Yu

Abstract

We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit in human vision. With tracked sequences as input, a hierarchical network of modules learns invariant features using a temporal slowness constraint. The network encodes invariances that become increasingly complex at higher levels of the hierarchy. Although learned from videos, our features are spatial rather than spatio-temporal, and are well suited for extracting features from still images. We apply our features to four datasets (COIL-100, Caltech 101, STL-10, PubFig) and observe a consistent improvement of 4% to 5% in classification accuracy. With this approach, we achieve a state-of-the-art recognition accuracy of 61% on the STL-10 dataset.
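The abstract's central idea, learning features from tracked patch sequences under a temporal slowness constraint, can be illustrated with a small objective function. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: it assumes a tied-weight reconstruction term (x versus W^T W x), square-root pooling of squared responses through a fixed non-negative pooling matrix H, and an L1 penalty between pooled features of consecutive frames, weighted by a hypothetical parameter `lam`. All function and variable names are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a rectangular matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def pooled_features(W: Matrix, H: Matrix, x: Sequence[float]) -> Vector:
    """Assumed pooling: p_j = sqrt(sum_k H[j][k] * (W x)_k ** 2).

    H must be non-negative so every square root is taken of a value >= 0.
    """
    z = matvec(W, x)
    return [math.sqrt(s) for s in matvec(H, [zi * zi for zi in z])]


def reconstruction_error(W: Matrix, x: Sequence[float]) -> float:
    """Assumed tied-weight term: squared error of x versus W^T W x."""
    recon = matvec(transpose(W), matvec(W, x))
    return sum((r - xi) ** 2 for r, xi in zip(recon, x))


def slowness_penalty(W: Matrix, H: Matrix, frames: Sequence[Sequence[float]]) -> float:
    """L1 distance between pooled features of consecutive frames in one track."""
    pooled = [pooled_features(W, H, x) for x in frames]
    return sum(
        sum(abs(a - b) for a, b in zip(p_t, p_next))
        for p_t, p_next in zip(pooled, pooled[1:])
    )


def objective(W: Matrix, H: Matrix, frames: Sequence[Sequence[float]], lam: float = 1.0) -> float:
    """Hypothetical per-track loss: reconstruction plus lam times slowness."""
    recon = sum(reconstruction_error(W, x) for x in frames)
    return recon + lam * slowness_penalty(W, H, frames)


# Usage: a "patch" whose energy moves from input 0 to input 1 between frames.
W = [[1.0, 0.0], [0.0, 1.0]]           # identity encoder (reconstructs perfectly)
frames = [[1.0, 0.0], [0.0, 1.0]]
H_pooled = [[1.0, 1.0]]                # one pooling unit covering both features
H_unpooled = [[1.0, 0.0], [0.0, 1.0]]  # no pooling
print(slowness_penalty(W, H_pooled, frames))    # 0.0
print(slowness_penalty(W, H_unpooled, frames))  # 2.0
```

With a pooling unit that spans both features, the shifted patch incurs no slowness cost, whereas the unpooled features pay 2.0. Under these assumptions, minimizing the slowness term favors pooled responses that stay constant as a tracked patch moves, which is the kind of invariance the abstract describes.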

Similar resources

Deep Learning of Invariant Spatio-Temporal Features from Video

We present a novel hierarchical and distributed model for learning invariant spatiotemporal features from video. Our approach builds on previous deep learning methods and uses the Convolutional Restricted Boltzmann machine (CRBM) as a building block. Our model, called the Space-Time Deep Belief Network (STDBN), aggregates over both space and time in an alternating way so that higher layers capt...

Deep Learning for Saliency Prediction in Natural Video

The purpose of this paper is the detection of salient areas in natural video using new deep learning techniques. Salient patches in video frames are predicted first. Then the predicted visual fixation maps are built upon them. We design the deep architecture on the basis of CaffeNet implemented with Caffe toolkit. We show that changing the way of data selection for optimisation of networ...

Recognition of Visual Events using Spatio-Temporal Information of the Video Signal

Recognition of visual events as a video analysis task has become popular in the machine learning community. While the traditional approaches for detection of video events have been used for a long time, the recently evolved deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...

DeepCAD: A Computer-Aided Diagnosis System for Mammographic Masses Using Deep Invariant Features

The development of a computer-aided diagnosis (CAD) system for differentiation between benign and malignant mammographic masses is a challenging task due to the use of extensive pre- and post-processing steps and an ineffective feature set. In this paper, a novel CAD system is proposed called DeepCAD, which uses four phases to overcome these problems. The speed-up robust features (SURF) and local b...

Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study

Despite considerable advances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and the arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...

Publication date: 2012